
fix(generic_files): flint_sprintf on 32-bit glibc (closes #2646) #2648

Merged
fredrik-johansson merged 3 commits into flintlib:main from edgarcosta:fix/2646-armhf-tests
Apr 28, 2026

Conversation

@edgarcosta
Member

Attempt to fix #2646.

On 32-bit glibc (i386, armhf), vsnprintf(dst, n, fmt, …) with n ≳ 16 MB silently drops everything after the first character. flint_vsprintf was routing through flint_vsnprintf(s, INT_MAX, …), so every flint_sprintf("x%wd", k) produced "x" instead of "x<k>".

fmpz_mpoly_set_str_pretty builds variable names with this exact call, so every variable collapsed to "x". The parser then failed prefix matching, returned -1, and left the polynomial malformed, which is why the three deterministic tests in #2646 fail on 32-bit:

  • mpoly_test_irreducible: FAIL: check 8 variable example
  • fmpz_mpoly_compose_fmpz_mpoly: Check non-example 1
  • nmod_mpoly_compose_nmod_mpoly: Check non-example 1

Fix

Give flint_sprintf / flint_vsprintf their own sink in a new src/generic_files/io_vsprintf.c that calls system vsprintf directly. sprintf semantics already require the caller to provide a sufficiently large buffer, so no length bound is needed and the glibc edge case is avoided. flint_snprintf is unchanged.
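
The separate-sink idea can be sketched roughly as follows. This is a minimal illustration, not the contents of io_vsprintf.c: the `demo_*` names are hypothetical, and the sketch uses the standard `%ld` conversion rather than FLINT's `%wd`, which the real sink has to handle itself. The point is only that no size argument ever reaches the C library on the unbounded path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unbounded sink: forwards straight to the system vsprintf,
   so no (possibly huge) length bound is ever passed to the C library. */
int demo_vsprintf(char *dst, const char *fmt, va_list ap)
{
    return vsprintf(dst, fmt, ap);
}

/* Hypothetical stand-in for flint_sprintf. As with sprintf, the caller
   must guarantee that dst is large enough for the formatted result. */
int demo_sprintf(char *dst, const char *fmt, ...)
{
    va_list ap;
    int res;

    va_start(ap, fmt);
    res = demo_vsprintf(dst, fmt, ap);
    va_end(ap);
    return res;
}

/* Round-trip check in the spirit of the regression test:
   building the variable name "x1" must not collapse to "x". */
void demo_selfcheck(void)
{
    char buf[32];
    int len = demo_sprintf(buf, "x%ld", 1L);

    assert(len == 2);
    assert(strcmp(buf, "x1") == 0);
}
```

The bounded entry point would keep its own path through `vsnprintf` with the caller's size, which is why `flint_snprintf` does not need to change.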

Potential alternative fix:

A smaller change that also fixes the bug is a 6-line diff in flint_vsnprintf_vprintf itself:

if (avail > ((size_t) 1 << 16))
    res = vsprintf(dst, fmt, ap_copy);
else
    res = vsnprintf(dst, avail, fmt, ap_copy);

Pros: no new file, no duplicated sink boilerplate (~120 fewer lines).

Cons: a subtle behavior change for flint_snprintf callers passing n > 64 KB, who would no longer get truncation at n-1. In practice this path was already broken on 32-bit glibc (the very bug we're fixing), and anyone using snprintf with n > 64 KB is effectively using it as sprintf. But it does cross the architectural line between bounded and unbounded writes. Also, the 1 << 16 threshold is a hardcoded constant tuned to the observed glibc behavior: it is not principled, and a future glibc change could shift the threshold without us noticing.

I went with the separate-sink version because it preserves snprintf's contract exactly and avoids the hardcoded threshold. Happy to switch on request.
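
For reference, the snprintf contract mentioned above is the standard C one: at most n-1 characters are written, the result is always null-terminated when n > 0, and the return value is the length the full output would have had. A small sketch of that behavior, using a hypothetical `bounded_name` helper and plain `%ld` instead of `%wd`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "x<k>" into a buffer of capacity n.
   Returns what snprintf returns: the length the untruncated string needs. */
int bounded_name(char *dst, size_t n, long k)
{
    return snprintf(dst, n, "x%ld", k);
}

void bounded_name_selfcheck(void)
{
    char buf[4];
    int needed = bounded_name(buf, sizeof buf, 12345L);

    assert(needed == 6);              /* "x12345" has 6 characters */
    assert(strcmp(buf, "x12") == 0);  /* output truncated to n-1 = 3 characters */
}
```

A threshold-based fallback to `vsprintf` would silently give up the truncation guarantee for large n, which is the behavior change described in the cons.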

Test plan

  • New flint_sprintf regression test (src/test/t-io.c) covering %wd round-trip, WORD_MIN/WORD_MAX, and mixed %wd %wu %wx. With the fix reverted, it reports the exact failure: flint_sprintf("x%wd", 1) gave "x" expected "x1" on i386.
  • The three tests from #2646 (Test failures on 32-bit ARM build of FLINT 3.5.0) pass on native i386 (-m32).
  • Full make check passes on i386 (-m32) and amd64.
  • flint_sprintf, mpoly_test_irreducible, fmpz_mpoly_compose_fmpz_mpoly, nmod_mpoly_compose_nmod_mpoly pass on armhf under qemu-arm-static. (qemu-user did not reproduce the original failure; only real 32-bit glibc does. The run still confirms no regression.)
  • Real armhf hardware verification: would appreciate a re-run by @d-torrance on Debian armhf to close out #2646.

`flint_vsprintf` previously routed through `flint_vsnprintf` with a size of
`INT_MAX`, but on 32-bit glibc `vsnprintf(dst, n, ...)` silently drops output
past the first character once `n` exceeds about 16 MB. The result on i386 and
armhf was that `flint_sprintf("x%wd", 1)` produced `"x"` instead of `"x1"`.

This broke `fmpz_mpoly_set_str_pretty` (which builds variable names via
`flint_sprintf("x%wd", i + 1)`): every variable became the literal string `"x"`,
the parser then failed prefix matching, returned `-1`, and left the polynomial
malformed. Tests that exercise the parser then saw a zero/garbage polynomial
where they expected a deterministic input — the symptom reported in flintlib#2646:

    mpoly_test_irreducible            FAIL: check 8 variable example
    fmpz_mpoly_compose_fmpz_mpoly     Check non-example 1
    nmod_mpoly_compose_nmod_mpoly     Check non-example 1

Fix: give `flint_sprintf` its own sink in a new `io_vsprintf.c` that calls the
system `vsprintf` directly. `sprintf` semantics already require the caller to
provide a sufficiently large buffer, so no length bound is needed and the
glibc edge case is avoided.

Also add a regression test that catches the precise failure mode
(`flint_sprintf("x%wd", n)` round-trip plus a few `WORD_MIN/MAX` cases and a
mixed-`%w` format) and is registered in `src/test/main.c`. Verified by
reverting only the `io_vsnprintf.c`/`io_vsprintf.c` changes: the new test
reports `flint_sprintf("x%wd", 1) gave "x" expected "x1"` on i386, then passes
once the fix is restored. Full `make check` passes on i386, amd64, and armhf
(under qemu-arm-static).

Closes flintlib#2646.

Caught by the MinGW64 (LLP64) CI run on the previous commit (141266a):
slong on Windows is `long long` (64-bit) but `long` is 32-bit, so the
expected-value computation `snprintf(expected, ..., "%ld", (long) values[ix])`
truncated WORD_MIN/WORD_MAX to 32 bits (WORD_MIN became 0) and the test
reported a false failure
"flint_sprintf(\"[%wd]\", -9223372036854775808) gave ... expected \"[0]\"".

Cast to `long long` and use `%lld` instead, which fits slong on every
supported platform (slong is `long` on LP64 and `long long` on LLP64).
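
A minimal sketch of the portable expected-value computation described in this commit message, assuming a 64-bit word. `format_expected` is a hypothetical helper and `int64_t` stands in for slong here; neither is taken from the actual test file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render a 64-bit signed value as "[<v>]".
   Casting to long long and using %lld avoids the 32-bit `long`
   truncation on LLP64 targets such as MinGW64. */
int format_expected(char *dst, size_t n, int64_t v)
{
    return snprintf(dst, n, "[%lld]", (long long) v);
}

void format_expected_selfcheck(void)
{
    char buf[32];

    format_expected(buf, sizeof buf, INT64_MIN);
    assert(strcmp(buf, "[-9223372036854775808]") == 0);

    format_expected(buf, sizeof buf, INT64_MAX);
    assert(strcmp(buf, "[9223372036854775807]") == 0);
}
```

`long long` is at least 64 bits wide on every conforming C99 implementation, so the cast is lossless whether slong is `long` or `long long`.
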
@albinahlback
Collaborator

Oh nice, and nice that you added a test file as well! I can confirm that changing INT_MAX to something reasonable fixes it on cfarm26.

@d-torrance
Contributor

This fixed the tests from #2646 on the Debian armhf porterbox!

However, I'm getting a new test failure I didn't see earlier. Could this be related to the changes?

gr_poly_log_series...
FAIL

Ring of 3 x 3 matrices over Rational field (fmpq)
n = 5
a = [[1, 0, 0],
[0, 1, 0],
[0, 0, 1]] + [[0, 0, -508],
[0, -406/3, 0],
[0, -13/6, 1/46]]*x^2 + [[0, 0, 0],
[0, 1, -8],
[-1/9, 0, -1]]*x^6 + [[0, 0, 0],
[-1000, 0, 0],
[0, 0, 0]]*x^7 + [[0, 0, 0],
[-1/460, 0, 0],
[0, 0, 0]]*x^9

b = [[1, 0, 0],
[0, 1, 0],
[0, 0, 1]] + [[0, 0, -2],
[1/89, -4797924355589564652257283/2, 0],
[0, 0, 1/512]]*x + [[0, -2/519, 0],
[-35740566643349127160, 59, 1],
[0, -1/32, 0]]*x^5 + [[4228641788/1751, 0, 0],
[-1, 0, 0],
[0, -1, 0]]*x^6 + [[0, -117/4, 0],
[0, -309485000602476631497375743/2166527304234035068851421184, 0],
[0, 51555860481/1051, 2]]*x^8 + [[-1, 0, 0],
[0, -16/143, 0],
[0, 936/553, 1]]*x^9

fa = [[0, 0, -508],
[0, -406/3, 0],
[0, -13/6, 1/46]]*x^2 + [[0, -1651/3, 127/23],
[0, -82418/9, 0],
[0, -242749/1656, -1/4232]]*x^4

fb = [[0, 0, -2],
[1/89, -4797924355589564652257283/2, 0],
[0, 0, 1/512]]*x + [[0, 0, 1/512],
[4797924355589564652257283/356, -23020078121959539233172234142846181028787226542089/8, 1/89],
[0, 0, -1/524288]]*x^2 + [[0, 0, -1/393216],
[7673359373986513077724078047615393676262408847363/356, -36816197829641385988107883389753964062591689754643694289852109548152094729/8, 1228268635030928550977864447/68352],
[0, 0, 1/402653184]]*x^3 + [[0, 0, 1/268435456],
[110448593488924157964323650169261892187775069263931082869556328644456284187/2848, -529923996741120226857499244772841880262055948334272200313980355417026060719203236203553404088483921/64, 1508643839800740363185175535557298684871671127684480257/46661632],
[0, 0, -1/274877906944]]*x^4

ab = [[1, 0, 0],
[0, 1, 0],
[0, 0, 1]] + [[0, 0, -2],
[1/89, -4797924355589564652257283/2, 0],
[0, 0, 1/512]]*x + [[0, 0, -508],
[0, -406/3, 0],
[0, -13/6, 1/46]]*x^2 + [[0, 0, -127/128],
[-406/267, 324659548061560541469409483, 0],
[-13/534, 20791005540888113493114893/4, 1/23552]]*x^3

fafb = [[0, 0, -2],
[1/89, -4797924355589564652257283/2, 0],
[0, 0, 1/512]]*x + [[0, 0, -260095/512],
[4797924355589564652257283/356, -69060234365878617699516702428538543086361679629515/24, 1/89],
[0, -13/6, 262121/12058624]]*x^2 + [[0, 0, -1/393216],
[7673359373986513077724078047615393676262408847363/356, -36816197829641385988107883389753964062591689754643694289852109548152094729/8, 1228268635030928550977864447/68352],
[0, 0, 1/402653184]]*x^3 + [[0, -1651/3, 34091302935/6174015488],
[110448593488924157964323650169261892187775069263931082869556328644456284187/2848, -4769315970670082041717493202955576922358503535008449802825823198753234546472829125831980636801630041/576, 1508643839800740363185175535557298684871671127684480257/46661632],
[0, -242749/1656, -34359738897/145410412773376]]*x^4

fab = [[0, 0, -2],
[1/89, -4797924355589564652257283/2, 0],
[0, 0, 1/512]]*x + [[0, 0, -260095/512],
[4797924355589564652257283/356, -69060234365878617699516702428538543086361679629515/24, 1/89],
[0, -13/6, 262121/12058624]]*x^2 + [[0, -26/9, -5720087/9043968],
[69060234365878617699516702428538543086361679623019/3204, -36816197829641385988107883389753964062591689754643694289852109548152094729/8, 1228268635030928550978124543/68352],
[-13/801, 15967492255402071162712237837/4608, 1/402653184]]*x^3 + [[-13/534, 5322497418467357054236849071/1024, 34091302935/6174015488],
[110448593488924157964323650169261892187775069258736530100571359980945732459/2848, -424469121389637301712856895063046346089906814615752032451498264689037874636081792199046276675345072401/51264, 34698808315417028353266385076272500853073524125272507159/1073217536],
[13/546816, -1101756965622742910258962007681/217055232, -34359738897/145410412773376]]*x^4

make: *** [Makefile:792: build/gr_poly/test/main_TEST_RUN] Aborted
make: *** Waiting for unfinished jobs....

@edgarcosta
Member Author

edgarcosta commented Apr 28, 2026

@d-torrance Thank you! I will investigate. Could you specify on what kind of machine you are observing this?

The pre-existing failure was likely just hidden earlier because something further up the test list aborted first.

@d-torrance
Contributor

This is on amdahl.debian.org, one of Debian's ARM porterboxes. The machine itself is 64-bit, but the tests were run in a 32-bit environment using schroot.

@edgarcosta
Member Author

Here is my investigation:

The gr_poly_log_series failure is a pre-existing ARM-specific bug:

  • On i386 native (-m32), with my fix: gr_poly_log_series passes deterministically.
  • On armhf under qemu-arm-static, with my fix: gr_poly_log_series FAILs (SIGABRT, exit 134).
  • On armhf under qemu-arm-static, with my fix reverted to current upstream/main: gr_poly_log_series FAILs byte-identically (diff -q of the two outputs reports no difference). So this PR is not the cause.
  • grep -rn "flint_sprintf\|flint_snprintf\|flint_vsprintf\|flint_vsnprintf" src/gr_poly/ src/gr_mat/ src/gr/ returns nothing, so none of those modules go through the code I touched. It was likely just hidden earlier because something further up the test list aborted first.

I did some quick bisection. With FLINT_BITS=32 (same RNG seed on both 32-bit archs), iters 0-54 produce identical RNG state on i386 and armhf. At iter 55 both pick GR_CTX_NF (number field) via gr_ctx_init_random. After that single call, the RNG state has diverged: i386 and armhf consumed different numbers of n_randint calls inside the same code path. The number-field path goes through fmpz_poly_randtest_irreducible (src/gr/init_random.c:157) which loops on irreducibility tests, so the divergence is most likely upstream of gr_poly_log_series entirely, in fmpz_poly_factor or fmpz_mod_poly_randtest_irreducible on 32-bit ARM. The matrix-ring failure at iter 259 is just the random ring that gets picked after the divergence accumulates.
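
The comparison step of this kind of bisection can be sketched as below, assuming one has recorded one RNG state word per test iteration on each architecture. `first_divergence` and the trace layout are hypothetical, and the values in the self-check are synthetic, not data from the runs described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given two per-iteration RNG state traces of length n,
   return the first iteration at which they differ, or n if they agree. */
size_t first_divergence(const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}

void first_divergence_selfcheck(void)
{
    /* Synthetic traces: identical for iterations 0-2, different at 3. */
    const uint64_t arch_a[] = { 11, 22, 33, 44 };
    const uint64_t arch_b[] = { 11, 22, 33, 45 };

    assert(first_divergence(arch_a, arch_b, 4) == 3);
    assert(first_divergence(arch_a, arch_a, 4) == 4);
}
```

Once the first differing iteration is known, only the code executed in that iteration needs to be inspected for architecture-dependent behavior.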

We should certainly open another issue. I will keep investigating regardless.

@albinahlback
Collaborator

albinahlback commented Apr 28, 2026 via email

@fredrik-johansson fredrik-johansson merged commit 4a2ed06 into flintlib:main Apr 28, 2026
13 checks passed
@fredrik-johansson
Collaborator

Looks good to me. Thanks!



Development

Successfully merging this pull request may close these issues.

Test failures on 32-bit ARM build of FLINT 3.5.0

4 participants